An intelligent e-commerce dialogue agent built on reinforcement learning. It integrates ontology reasoning, a business toolchain, dialogue memory, and a Gradio interface. The system uses the PPO algorithm from Stable Baselines3 to close the loop from data to training to deployment, so the shopping assistant's decision-making policy is optimized autonomously.
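
To make the reinforcement-learning framing concrete, here is a minimal, standard-library-only sketch of a dialogue decision environment with a Gymnasium-style `reset`/`step` interface. Everything in it is an assumption made for illustration and is not taken from this project: the `ToyShoppingEnv` class, the four actions in `ACTIONS`, the observation layout, and the reward values. To train it with Stable Baselines3 PPO, such an environment would additionally need to subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
# Hypothetical action set for a shopping assistant (illustrative only).
ACTIONS = ("ask_clarification", "search_products", "recommend", "checkout")


class ToyShoppingEnv:
    """Toy dialogue environment with a Gymnasium-style interface.

    The dialogue has one stage per entry in ACTIONS. Choosing the action
    whose index matches the current stage advances the dialogue and earns
    a positive reward; any other action costs a small penalty.
    """

    def __init__(self, max_turns=6):
        self.max_turns = max_turns
        self.stage = 0
        self.turn = 0

    def _obs(self):
        # Observation: (current dialogue stage, turns used so far).
        return (self.stage, self.turn)

    def reset(self):
        self.stage = 0
        self.turn = 0
        return self._obs(), {}

    def step(self, action):
        self.turn += 1
        if action == self.stage:
            reward = 1.0
            self.stage += 1
        else:
            reward = -0.1
        # terminated: the dialogue reached its goal (all stages completed).
        terminated = self.stage == len(ACTIONS)
        # truncated: the turn budget ran out before the goal was reached.
        truncated = (not terminated) and self.turn >= self.max_turns
        return self._obs(), reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode; policy maps an observation to an action index."""
    obs, _ = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        steps += 1
        done = terminated or truncated
    return total, steps


if __name__ == "__main__":
    # A scripted policy that always picks the action matching the stage.
    print(run_episode(ToyShoppingEnv(), lambda obs: obs[0]))
```

A learned policy would replace the scripted lambda. The reward signal here is deliberately simplistic, whereas a real assistant would presumably derive rewards from task success and user feedback.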